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ABSTRACT 

We present a new algorithm for detecting transiting extrasolar planets in time-series photometry. The 
Quasiperiodic Automated Transit Search (QATS) algorithm relaxes the usual assumption of strictly periodic 
transits by permitting a variable, but bounded, interval between successive transits. We show that this method 
is capable of detecting transiting planets with significant transit timing variations (TTVs) without any loss of 
significance - "smearing" - as would be incurred with traditional algorithms; however, this is at the cost of an 
slightly-increased stochastic background. The approximate times of transit are standard products of the QATS 
search. Despite the increased flexibility, we show that QATS has a run-time complexity that is comparable to 
traditional search codes and is comparably easy to implement. QATS is applicable to data having a nearly unin- 
terrupted, uniform cadence and is therefore well-suited to the modern class of space-based transit searches (e.g., 
Kepler, CoRoT). Applications of QATS include transiting planets in dynamically active multi-planet systems 
and transiting planets in stellar binary systems. 

Subject headings: stars: planetary systems — techniques: photometric — techniques: numerical 



1. INTRODUCTION 

This paper describes a new algorithm for detecting the tran- 
sits of an extrasolar planet in a time-series of stellar flux obser- 
vations. Many algorithms have been developed and special- 
ized for this exact purpose (e.g., Kovacs et al. (2002), Renner 
et al. (2008), Defay et al. (2001), Aigrain & Favata (2002); 
see Tingley (2003) and Moutou et al. (2005) for a comparison 
of many of these methods). The variety of implementation 
in those algorithms arose primarily from the need to optimize 
transit detection (in the form of increased significance) in re- 
sponse to the nature of the data (e.g., Grziwa et al. (2012), 
Jenkins et al. (2010)), or to speed or automate the detection 
of transiting planets in a large collection of light curve data 
(e.g., Weldrake & Sackett (2005)). These specializations did 
not, however, address the potential degradation of detection 
significance as a result of the non-Keplerian planetary motion 
(exceptions are the algorithms proposed by Ofir (2008) and 
Jenkins et al. (1996) to detect transiting circumbinary plan- 
ets). Specifically, all of these algorithms operate on the as- 
sumption of strict periodicity of the arrival of transits. 

Strict periodicity is not expected when greater than two 
bodies are interacting gravitationally (e.g., in many-planet 
systems or for planets in binary systems). In these cases, we 
would expect non-linearities in the transit times, commonly 
referred to as transit timing variations (TTVs). The detec- 
tion of TTVs can suggest the presence of an unseen perturb- 
ing companion (e.g., Ballard et al. (2011), Nesvorny et al. 
(2012)) and yield information on the masses of the bodies in- 
volved in the gravitational exchange (Agol et al. (2005), Hol- 
man & Murray (2005)). The observation of TTVs in several 
transiting planets in a single system has proved invaluable 
in confirming their planetary nature and, more importantly, 
has placed useful constraints on their masses (Holman et al. 
(2010), Lissauer et al. (2011), Cochran et al. (2011), Carter 
et al. (2012)). Systems of this type are most likely to be de- 
tected by space-based transit surveys (e.g., Kepler, Borucki et 



al. (2010) and CoRoT, Baglin et al. (2006)) given their nearly 
continuous observing modes. 

Garcfa-Melendo & Lopez -Morales (201 1) have shown that 
the failure of traditional transit algorithms to account for 
TTVs would result in reduction of detection significance 
(transit 'smearing'). They showed that the reduction in sig- 
nificance is non-negligible with 'typical' system architectures 
comprised of many interacting planets in a single system. 
In particular, for a transit-timing variation with an amplitude 
<tttv that is longer than the transit duration and having a pe- 
riod shorter than the duration of the data, the signal-to-noise 

is reduced by a factor of \J Ttmnsit I gttv , where T tra „ S i t is the 
transit duration. The transit-timing variation amplitude tends 
to grow in proportion to orbital period, P, while the transit 
duration grows as P 1 / 3 . As such, the reduction is most sig- 
nificant for long period planets that are perturbed by nearby 
companion. 

The detection algorithm described in this paper (the 
Quasiperiodic Automated Transit Search or QATS) presents 
one possible solution to the problem raised by Garcfa- 
Melendo & Lopez-Morales (201 1) by allowing bounded vari- 
ation in the intervals between transits when determining the 
detection significance, accounting for possible TTVs. We 
pose the detection problem in § 2 and derive the algorith- 
mic solution (QATS) in § 3 by specializing and expanding 
the work by Kel'Manov & Jeon (2004) who tackled the more 
general problem of detection and pulse-shape discrimination. 
In § 3.4.1, we estimate the background significance in the 
presence of white noise as a function of the algorithm param- 
eters. In § 4, we provide a worked example of the application 
of the QATS to the Kepler data for Kepler-36 (Carter et al. 
(2012)). 

2. FORMULATION OF THE PROBLEM 

The QATS algorithm is capable of detecting the transits of 
an exoplanet in a series of observations of the relative flux of 
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its host star. 

We require — in order to simplify our discussion and to 
facilitate a fast algorithm — that the N observations of the 
normalized and median removed flux, F(t n ) for < n < N- 1, 
follow a uniform cadence; i.e., t n -t Q oc n. Given this, we may 
uniquely specify the time of any measurement by referring to 
its sequential cadence number n. We use this index exclu- 
sively in what follows as opposed to referring to the absolute 
time and write Fit n ) as F n . 

2. 1. Quasiperiodic transit light curve model 

We assume that the data are completely explained by a tran- 
sit light curve model, L(t n ) = L n , that is contaminated by ad- 
ditive, independent, identically distributed Gaussian noise of 
width er, such that F n = L„ + e with e <~ Af(0, a). Refer to Fig- 
ure 1 for a qualitative representation of the model, described 
next. 

During transit, the light curve decreases for a short amount 
of time when the planet occludes the star. We make the usual 
approximation (e.g., Kovacs et al. (2002)) that the light curve 
profile of a single transit event is boxed-shaped with depth 8 
and duration q. We assume that q is an exact integral num- 
ber of cadences and that the flux deficit starts precisely on a 
cadence. 

Suppose that M transits of fixed shape (fixed q and S) occur 
in the sequence {L n }. Let the collection of indices at the start 
of each of the M transits be 



f]M = {ni,...,n m -\,n m ,...n M -\,n M }- 



(1) 



We may compactly define the model light curve sequence 
{L n } as the sum of flux deficits from all M transits as 



(2) 



m=l 



where we have defined the "root" transit deficit profile as 

{S 0<j<q-l 
1 ] otherwise ' 



(3) 



For a strictly periodic transiting planet, we would expect the 
interval, A„„ between successive transits to obey (for m > 0) 



A m — — P •> 



(Periodic Transit) 



for a constant period P. We have distinguished this period, 
measured as an integral number of cadences, from the abso- 
lute period, P, which is measured in units of time and may be 
fractional. 

We wish to relax the constancy of this interval to allow for 
possible transit timing variations. Fully unspecified intervals 
need not be considered; we expect, based on physical consid- 
erations, that these a priori unknown variations (in interval) 
are small relative to the mean interval. The smallness of this 
variation depends on the specifics of the physical system to 
be detected. For example, the amplitude of this variability is 
<~ 1% of the mean orbital period for known cases of inter- 
acting planets in multi-transiting systems (e.g., Holman et al. 
(2010), Lissauer et al. (2011), Carter et al. (2012)). For tran- 
siting circumbinary planets, the amplitude is a much larger 
fraction: one would expect timing variations comparable to 
the orbital period of the stellar binary; this may amount to 
a variation <~ 10% of the mean orbital period of the planet 
(e.g., Doyle et al. (2011)). In either case, we may specify a 



variable but nearly periodic, or 'quasiperiodic' bounding, be- 
tween successive transits: 

A min < A m = n m -n m -i < A max (Quasiperiodic Transit) 

where A m ; n and A max are the minimum and maximum inter- 
vals considered. Following the above, it is sometimes con- 
venient to parameterize the difference in the maximum and 
minimum interval to be some small fraction, /, of the mini- 
mum interval: A max - A min = /A min . 

In addition to the quasiperiodic bounding, we require that 
the transits be non-overlapping and that the first transit (start- 
ing at cadence n\) and the last transit (starting at cadence %) 
are fully contained within the sequence. These conditions are 
equivalent to 

(4) 
(5) 
(6) 



q < A min 
< «i < A mux -q 
N-A max <n M <N-q. 



Subject to the constraints above, the permitted number of tran- 
sits in L n lies between a minimum (M m ; n > 1) and maximum 
(M max ) where it is simple to show that 



M min = floor 



( N+q- 



V A, 



M max = floor( — ^ ) +1. 

\ ^min / 

2.2. Maximum-likelihood objective function 



(7) 
(8) 



The QATS algorithm is a maximum-likelihood method. In 
other words, QATS determines the set of starting cadences 
(?7M.bestX the transit duration (gbest) and transit depth (<5b es t) that 
best agrees with the data (has the maximum likelihood), sub- 
ject to the quasiperiodic condition. 

We have derived an expression, more suited for numerical 
computation than using the full likelihood, that is also max- 
imized for the optimal parameters. This objective function, 
S(r]M,q), is independent of a and the transit depth when the 
noise is stationary. The details of this derivation have been 
provided in the Appendix. The result is 



S(r)M,q) 



where 



M q-i 

S(vM,q) = ^2^2 -Fj+n,,, ■ 

m=l 7=0 



(9) 



(10) 



The most likely transit depth (also derived in the Appendix) 
is 



<5best - 



S(vM,q) 

Mq 

S(VM,q) 



(11) 
(12) 



S(f]M,q) has a simple physical interpretation when maxi- 
mized: 

S(r]M,q) _ Sbesi fj^j 
a a 

= (S/N) total (13) 

where (S/AOtotai is the total transit signal-to-noise ratio. 



QATS 
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FIG. 1 . — Cartoon representation of the QATS light curve model. Refer to the text in § 2. 1 for a description of the parameters shown here. 



3. THE QATS ALGORITHM 

The previous section introduced an expression, S(rj M ,q) 
(Eqn. 9), that is maximized when the likelihood (Eqn. A3) 
is greatest. The task remains to describe the algorithm that 
actually maximizes S(r] M ,q) with respect to the transit dura- 
tion, q, and the set of starting cadences, t]m for a selected A m i n 
and A max (or A min and fraction f). 

Plausible transit durations may be restricted to a reduced 
range of cadences for a given period subject to simple physi- 
cal expectations (in particular, duration oc P 1 / 3 ). In any case, 
the allowed range of durations may be searched, marginaliz- 
ing with respect to r\ M for each duration, to find the maxi- 
mum likelihood and best-fitting duration. Similarly, the pos- 
sible number of transits in an observation {F n } is enumerated 
from M m ; n and M max , according to Eqns. (7, 8), and may be 
searched explicitly. 

3.1. Dynamic programming solution for maximization over 
r] M by Kel'Manov & Jeon (2004) 

The only non-trivial marginalization of S(i] M ,q) - over r\ M 
for a fixed q and M - is found using a specialization of the 
algorithm by Kel'Manov & Jeon (2004). We review this al- 
gorithm in what follows. 

First, we further distill the objective function: for a 
fixed M and q, S(rj M ,q) is maximized when S(rj M} q) = 
2~2m=i 2~2%o~Fj+n m is maximized. Note that S(i] M ,q) may be 
written as 



S(r) M ,q) = ^D n 



(14) 



m=l 



where 



q-\ 
J=0 



(15) 



D n can be interpreted as the discrete convolution of the data 
with a box of unit depth and duration q (a box "matched- 
filter"). At a transit event, D„ has a symmetric triangular 
profile of width q, peaking at the start of the transit in the 
absence of noise (see Figure 2a)). Written this way, we see 
that the maximum of S(r]M,q) is the maximum of a sum of M 
numbers, drawn from D n with the selected indices subject to 
the quasiperiodic condition. The additive nature of this objec- 
tive function along with the quasiperiodic constraints admits a 
'dynamic programming' solution for the maximization, to be 
described shortly. Dynamic programming typically describes 
a class of algorithms in which calculations made at prior steps 
in an iteration are retained in memory for use in the current 



step 1 . 

Second, we enumerate all possible transit start times. Viz., 
the first transit in the sequence F„, beginning at index m, must 
occur before the maximum interval has passed (according to 
Eq. 5). We consider, in turn, the possible starting indices m in 
this set of possible indices. We refer to this set as uj\. For ex- 
ample, with q = 2, A max = 4, the maximum index the first tran- 
sit can begin at is ni !max = A max -g = 2 such that uj\ = {0, 1,2} 
and «i gui. The second transit, starting at index «2 € cl»2, 
has to occur between A m ; n and A max after m £ u\ and must 
also permit the inclusion of M-2 more transits, all subject 
to the quasi-periodic condition, in the N observations. The 
location of the third, fourth, etc. transit must follow analo- 
gously. Summarily, each transit m of M total transits is re- 
stricted to a set of possible starting indices, uj m , subject to our 
constraints. Kel'Manov & Jeon (2004) showed that the possi- 
ble sets of transit start indices may be exactly specified given 
A min , A max , N, q and M as 



u m = {i:n' m <i<nl} 



(16) 



where 



n" = min 



max[(m- l)A min ,JV- A max (M-m+ 1)] (17) 
[mA max -^,Af-^-(M-m)A min ] . (18) 



It is also useful to consider the subset of cj„,_i that is con- 
ditioned on the knowledge of the starting index of the next 
transit. In particular, if the mth transit is known to occur at 
index n m = n, then subject to the quasiperiodic condition, the 
preceding transit (the (m- l)th transit) must have started at 
index n m _i such that n— A max < n m -\ < n- A mm . We refer to 
the intersection of the set given by this inequality and the set 
u m -\ of allowed transit start times of the (m-l)th transit as 
-f m -i(n). Again, Kel'Manov & Jeon (2004) provide the exact 
expression of this set in closed form: 



j m - l (n) = {i:n'^_ l (n)<i<nZ l (n)} 



where 



, (n) = max [(m - 2) A min , n - A 
n'^Li («) = min[(m - 1 ) A max - q, n 



max] 
A m i n ] 



(19) 

(20) 
(21) 



With the above formalism, we may detail the algorithm by 
Kel'Manov & Jeon (2004). We calculate the maximum of 
S(T]M,q) by considering successive transits (restricted to the 
indices specified by the uj m ) starting with the first. At each 
transit number m and candidate transit start index n e w m , we 
calculate an intermediate quantity S mn = max S(r] m \n m = n;q) 
where rj m = {n\, ...,n m } is the subset of the first m transits in 

' Classic examples of dynamic programming are those used to determine 
sequences defined by a recurrence relation; e.g., the Fibonacci sequence. 
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r\u which is the maximum value of S(r] m ,q) for this circum- 
stance, i.e. excluding the contribution from proceeding tran- 
sits but including the contribution from preceding transits. 
Given this definition and Eq. 14, we may recursively construct 
S mn for all transits m G {1, ...M}: 

Smn=D n + max S m -\; t neiv m (22) 

ie7m-l(») 

S u =D n neuj! (23) 

where we have utilized the set j m -\(n), as defined above, in the 
first line. It is then simple to show that the desired maximum 
of the objective function for fixed q and M is given by 

max S(t] M , q) = max S Mn (24) 



We calculate S mn following the dynamic programming 
method by minimally retaining in memory S m -i t „ for all n <G 
<jj m -\ so as to calculate the mth row of S mn without repeating 
any calculation. In Fig. 2 we show a graphic interpretation 
of the explicit execution of this recurrence, as would be per- 
formed by a computer, for a specific example. 



3.1.1. The most likely indices at the start of the M transits 

The most Jikely set of transit start indices r]M,_ correspond- 
ing to ma\S(i]M 7 q), may be determined once S mn has been 
calculated by traversing this matrix in reverse 2 ^ In detail, the 
index n = n M thatgives the maximum value of Sm,i f° r n <= w m 
(equal to max^ M S(i]M,q) according to Eq. 24) corresponds to 
the most likely starting index of the final (the Mth) transit. 
We may then recursively determine the most likely starting 
indices of the earlier transits conditioned on the knowledge of 
the location of the proceeding transit(s): 

n M = argmax S Mn (25) 

n m = argmax S mn (26) 

«e7„,(«„,+i) 

where argmax ig/ a, is the index i max such that a im!lK is maxi- 
mum for all ; e / and we have made use of the conditional 
set 7 m _i (n), defined in the preceding discussion, in the second 
line. Fig. 3 shows this recursion for the example presented in 
Fig. 2. 



3.2. QATS algorithm, in summary 

Incorporating the solution by Kel'Manov & Jeon (2004), 
we may summarize the QATS algorithm in pseudocode, 
shown as Algorithm 1 . 



2 It is required to retain the entire matrix S nm for this purpose. 



Algorithm 1 QATS algorithm 

Require: F n is the data (normalized and median removed). 

1 : Given A min and A max 

2: Sbesi «— 

3: for all considered transit durations q do 
4: Calculate D n {According to Eq. 15} 

5: for number of transits M = M m j n to M max {with Eqns. 7, 8} do 

6: Calculate S„ m {According to Eqs. 22 and 23} 

7: Si- max„ guJM S M „ 

8: if Sjs/Mq > S tes , then 

9: 5 besl <- S/Vttj 

10: qbest «- q 

11: M best ^M 
12: end if 

13: end for 

14: end for 

15: Sbest i s maximum of objective function with q = <jbest> M = Mb est . 

16: Most likely transit depth is <5 best = 5 best / v / *Cst9best- 

17: Most likely transit start indices, t]m = {«!,...,«*(}, calculated as in 
§3.1.1. 



The most likely set of times may then be determined using 
^best and M best in Eqns. 15, 22, 23 applied to Eqns. 25 and 26. 

3.3. Computational considerations 

The QATS algorithm may be implemented as listed in Al- 
gorithm 1, however, there are some practical considerations. 

The term S mn is most simply realized in code as a N xM 
matrix. This matrix can be memory intensive. For exam- 
ple, a typical Kepler light curve, observed at the nominal 
~ 30 minute cadence and collected through observing quar- 
ter 9 (750 days), contains N «36,000 cadences. A transiting 
planet with orbital period P admits M rts 750 days/P transits 
in that light curve. With 8 bytes allocated per matrix entry, 
we can anticipate requiring w 200 MB • days/P of memory to 
store S mn . Short period orbiting planets (P < 1 day) may re- 
quire disk access, slowing the search dramatically. Addition- 
ally, when searching over multiple periods (see below), it is 
advantageous to conserve shared memory for the purpose of 
parallelization. It is therefore recommended during the calcu- 
lation of Sbest tnat on ly a 2 x M submatrix of S mn be retained 
containing the (m — l)th row and the mth row under calcula- 
tion. The determination of the most likely times still requires 
the full matrix, however, these values are not needed for the 
calculation of the objective function maximum and can be cal- 
culated in a posterior operation. 

The convolved data, D n , as defined in Eqn. 15, may be mod- 
ified to support 'inline' data detrending with more compli- 
cated matched-filters. For example, the filter may include a 
linear correction outside of transit. We also note that the D n 
may be pre-computed and stored in memory for a number of 
common durations prior to the first loop in Algorithm 1 . 

It is likely that some cadences will be missing from the ob- 
servation sequence (for example, those lost in a Kepler light 
curve during data downlink between quarters). Ignoring these 
cadences violates the preconditions of the QATS algorithm 
and may lead to unexpected results. We recommend filling 
these missing cadences with zero values. Note in this case 
that transits occurring during these gaps will provide no addi- 
tional significance to a detection and the most likely times of 
these missing transits will be meaningless. 

3.4. The QATS 'spectrum' 

The QATS algorithm, detailed in Algorithm 1, determines 
the most likely duration and times for a transiting planet with 
'period' specified by the pair of interval bounds, A m i„ and 
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FIG. 2. — Here we show a graphical representation of the step-by-step execution of the QATS algorithm for a noiseless example, a) The 'data' are shown 
in the top time-series, containing three transits of duration q = 2 and of quasperiodic interval. The maximum and minimum intervals (A m j„ and A max ) are 
shown graphically. The box-filter has been applied to the data in the middle time-series (D„, given in Eqn. 15). The grayscale bar represents D„ as well; each 
cell represents a single cadence and is color coded (increasing from white to black) according to the value of D„. b) This panel shows the 'initial' state of the 
yet-to-be-computed matrix S,„„, described in § 3.1. Each row represents an individual transit m (of three total). Each row currently has a copy of D„; only those 
cadences without hashing in their respective cells are relevant during the computation. The allowed set of transit start times ui m (Eqn. 16) is indicated by a label 
and a range for each row m. c) There is nothing to be done for the first row. Here, and for c) through g) we sum the value of D„ and the maximum of the 
significance in the first transit over the possible set of transit start times for the first transit assuming the second transit starts at the indicated location in = 5,6,7. 8 
or 9). These sets are shown explicitly in the first row by the braced range in red (and labelled by ~f m -\ (n) as defined in Eqn. 19). The maxima of those sets have 
been highlighted with a red boundary, h) - 1) We repeat the procedure performed on the second row for the third transit iterating over the possible starts of the 
third transit (n = 10, 11, 12, 13 or 14). I) At this point, the S„ m have been calculated. The maximum significance is then determined as the maximum of the S mn 
with m = 3; that cadence has been highlighted with a blue boundary. 



A max . In practice, one does not know the period of the tran- 
siting planet (nor the variation on that period between tran- 
sits) and multiple periods need be compared for significance. 
Traditional search algorithms establish a grid on period (typ- 
ically being uniform in orbital frequency) and perform the 
fixed period marginalization over duration and phase at each 
grid point. There is no single period in the QATS algorithm, 
however, a number of strategies can be suggested to perform 
the analogous task. We suggest a couple below. 

The first strategy is a uniform grid over minimum interval, 
Amin, with a fixed difference between this minimum interval 
and the maximum interval (such that A max - A m i n = AA). In 
this case, a natural grid spacing is AA > 0. In this strategy, 
the allowed variation in the transit interval (or period) is the 
same for all trial 'periods' and may not reflect the underlying 
physical prior. 

In contrast, the second strategy allows the difference be- 
tween minimum and maximum interval to be some fixed frac- 
tion, /, of the minimum interval (some fraction of the 'pe- 
riod') which may better reflect our theoretical expectations 



(Agol et al. 2005) when the sought after transiting body is be- 
ing perturbed by another body. In this case, we grid A m ; n such 
that the z'th grid point is 

A® n = Al(l+//2)' (27) 
A« x = A« m (l+//2). (28) 

The fraction / may be tuned for a particular search or may 
be searched over a small grid as well; any transit-timing vari- 
ations with a fractional amplitude of f /2 will be contained 
within one of the ranges searched within this grid. 

Both strategies can have coverings that are 'complete' and 
non-overlapping over any interval in A m i n - our grid resolu- 
tion does not affect our ability to detect a given period for a 
strictly periodic transiting planet, excluding signal confusion 
due to a significant stochastic background signal (see § 3.4.1). 

The detection spectrum (i.e., maxS(i]M,q) or the maximum 
total transit signal-to-noise as a function of A m ; n ) has char- 
acteristics similar to traditional transit search spectra that are 
based on a box matched-filter. In particular, the QATS spec- 
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FIG. 3. — A graphical representation of the matrix S,„„ and the determi- 
nation of the most-likely transit start indices for the example represented in 
Fig. 2 according to the procedure described in § 3.1.1. Refer to the caption of 
Fig. 2 for details on this graphic representation; the state of Smn shown here 
is identical to the state in panel / of Fig. 2. 

trum is roughly proportional to (mri)~ x l 2 where m/n is the 
rational number with smallest denominator n that lies be- 
tween [A^JP, A® JP] with P being the actual period of 
the transit. This proportionality is simply explained: suppose 
your trial period is A tr j a i, while the actual period is P with 
Atrial// 1 = m/n where ra and n are mutually prime integers. 
Then every rath transit will be detected, so the signal will be 
reduced by a factor of 1 /ra, while the noise will be changed 
by a factor of yjn/m: the signal-to-noise decreases by a fac- 
tor of \/y/mn. This signal-to-noise is maximized when ran 
is minimized, giving our dependence. The numbers n and 
ra may be efficiently calculated with use of the 'Farey' se- 
quences by finding the lowest order Farey sequence which 
has a fraction, m/n, contained within the trial period search 
range (in units of the correct orbital period); this guarantees 
that m/n is the smallest mutually prime ratio and thus has the 
highest signal-to-noise relative to the peak. For trial periods 
above the correct period, the Farey sequence search should 
take place between [P/A^^f/A^J. Figure 4 shows a plot 
of 1 / ^fmn without the presence of noise for an interval width 
of / = 0.001. In practice, we find that this pattern matches 
the QATS spectrum of detected planet candidates well, while 
other forms of light curve variability show a different shaped 
QATS spectra. We note that the robustness of a detection of a 
transiting planet (or at least a periodic box signal) may be val- 
idated by comparing the measured spectrum to this prediction 
for a specified candidate period. See § 4 for example QATS 
spectra computed from noisy data. 

3.4.1. Stochastic background 

The QATS algorithm permits the detection of transiting 
planets with large variations in transit interval without sup- 
pressing the significance of the transit; however, the ex- 
changeable cost of this freedom is increased backgrounds that 
may 'swamp' a potential detection signal. The stochastic 
background - i.e., the conspiracy of noise that yields a smooth 
level of non-zero significance beneath the sharp transit de- 
tections - grows as the difference between A max and A m ; n 
widens. It is difficult to obtain an exact closed form expres- 
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Trial period / actual period 

FIG. 4. — Noise-free expectation for the shape of the QATS spectrum for 
/= Amax/ A.- 1=0.1%. 

sion for this background; however, in what follows we derive 
an upper bound with some simple assumptions and, further, 
propose formulae that closely approximate the true behavior 
of the background. 

We assume a white-noise background, F„ ~ Af(0, a 2 ); i.e., 
the data are independent, identically distributed Gaussian 
variables with zero mean and variance a 2 . We consider the 
distribution of the maximum of the total transit signal-to-noise 
of this background where, according to Eqns. 13, 9 and 14, 



max(S/A0 toto i, BG = max 



W \J Mqcr 2 



1V1 

= maxy^D„ (Mqcr 2 ) 1/2 

m=l 
M 

=maxVDi M" I/2 



(29) 
(30) 



m= 1 



—1 /2 

Referencing Eqn. 15, we see that the D„ (q<J 2 ) = D' n are 
Gaussian distributed with zero mean and unit variance and are 
correlated whenever \n m —n m \ < q. The requirement that the 
transits be non-overlapping (Eqn. 4) implies that the {D' n } 
are independent. As such, it is simple to see that the sum 

= J2m=i D 'n,„ M ~ 1/2 is distributed identically as the D'„; 
these variables are also correlated as discussed below. 

We may then regard the maximization of Y, m over r\M as 
being approximately equal to the maximum, X, of some ap- 
propriate number of independent draws K from a normal dis- 
tribution with zero mean and unit variance. The number K 
represents the effective number of independent transit config- 
urations, yielding independent sums It can be shown 
(pg. 374, Cramer (1946)) that the mean of the extremum 
variable X is asymptotically approximated for large K as 



v 1 v B 2 v /2Tog7r 



1 



for^T3|31) 



When A max = A m ; n (i.e., periodic transits), the number of 
draws K is equal to the number of independent choices for 
the start time of the first transit as there is no additional free- 
dom in the choices of the remaining M- 1 transits. There 
are K = A max - q + 1 available cadences for the start of the 
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first transit; however, for q ^ 1, these choices are not inde- 
pendent (with correlation length <~ q). We can account for 
this over-counting by dividing K by an effective correlation 
length I = q/a for some a > 1 . Then, the number of indepen- 
dent sums is Aperiodic ~ ceA m . dx /q. Referring to Eqn. 31, for 
large, we have that 



max(S/AO to tai,BG ~ y 21og (aA max /g) . (Periodic Search) 

This analytic theory compares well with simulation (see 
Fig. 5) so long as a = a(q); a = {1.0,2.2,3.0} gives excellent 
agreement with simulated periodic QATS backgrounds when 
q = {1,5, 14}, respectively. Obviously, this dependence also 
applies to singleton transits for any choice of A max > A m i n . 

We may similarly consider the total number of ways to ad- 
mit M transits in F„ for A max > A m i n . As before, the first tran- 
sit may start at A m . ix -q + 1 positions. Following the choice 
of the mth transit, the following (m+ l)th transit must start 
at a choice of A max - A m i n + 1 positions, according to the 
quasiperiodic condition. Applying the rule of products, and 
accounting for the correlation length at each transit, the total 
possible transit configurations is therefore 

^Total ~ ( — ) A max Y[ ( — ) ( A max - A m i n + 1 ) 



m=2 



1 



M 



Amax ( A max A m i n + 1 ) 



M-l 



(32) 



However, it must be the case that max(S/A0totai,BG < (^O-Ktomi 
as the K tot - d \ choices are not independent choices in this prob- 
lem. This is because the contribution to the maximized sum, 
H Vu , is greater for m small than that for m large. We may at- 
tempt to account for this variable contribution by weighting 
the above product by a simple function of transit number m: 

#qats~ (^j A max Y[ (j^j (A m . dX -A min +l)m- 1/r 

= ( A max (A max -A min +l) M - 1 [(M-l)!]- 1 (33) 

where 1 jr is a constant shaping parameter. We then expect 
the background signal in the QATS search to have a depen- 
dence according to max(S/AO to tai.BG = (X)k (Eqn. 31) with 
K = Kq A ts where, to simplify the computation, in the typical 
limit of Kq ATS > 1, 

log Xqats w -M log i + log A max + (M - 1 ) log ( A max - A min + 1 ) 
a 



— [(M-l)log(M-l)-(M-l)] . 



(34) 



In Fig. 5, we plot the analytic prediction for the background 
as a function of A max /g for a selection of A max - A mm , in 
units of duration q, along with simulated backgrounds of the 
same parameters using the QATS algorithm. In this figure, we 
have assumed M w (N/q)/(A msa /q) with N/q = 2500 which 
is reasonable in a QATS search for transits of 7 hour duration 
in a long cadence Kepler light curve spanning 750 days (N = 
35000,^=14). 

We find that r is dependent on A max - A m ; n with r w 1 1 .76, 
7.14 and 6.25 giving the closest agreement between simula- 
tion and theory for A max - A mm = lq,2q and 3q, respectively. 



These choices of r (and those for a = a(q) as provided above) 
have been used in generating the theory curves in Fig. 5. 

The background level may be compared to the detection 
threshold of 7.1 set by the Kepler mission (Jenkins et al. 
(2010)), for example. The background exceeds this thresh- 
old for short periods, A max /q < 40, when A max - A mm = lq. 
For the Kepler-specific example, this corresponds to absolute 
periods of < 10 days with quasiperiodic variability / > 2.5% 
in 750 days of long cadence data. 

The above formulae are useful to gauge the growth of the 
background QATS signal in presence of white noise, however, 
the actual background in a specific search is best represented 
via simulation if possible. 

The starting indices of the most likely transit signal in the 
presence of background alone are best described by a one di- 
mensional constrained random walk whose intervals are ap- 
proximately uniform between the minimum and maximum 
intervals. This characteristic is specific to random noise and 
may be diagnostic of such; we would generically expect phys- 
ical timing variations to have a non-stochastic form. 

3.5. Run-time analysis 

The success of the QATS algorithm is contingent on being 
both precise and efficient in computational time (and secon- 
darily in memory-usage, see § 3.3). In the case of the lat- 
ter, the QATS algorithm performs its most complex computa- 
tional task and the only task_that is distinct from fixed-period 
searches, that of computing S mn (described in § 3. 1), in 'poly- 
nomial' time-complexity. In other words, the number of sim- 
ple computational evaluations on a computer scales as some 
(small) power of the parameters of the problem. In particu- 
lar, for fixed q and M, it can be shown (Kel'Manov & Jeon 
(2004)) that the computation of S mn and the maximization of 
the final row (see § 3.1) has time-complexity 



M(A max -A„ 
q(N-q+l)] 



1 + q)(N-q+l)]M>2 



M= 1 



(35) 



The time to execute a search scales only linearly with the 
(QATS-specific) interval difference A max - A mm . For the most 
typical case of A max -A mm small (see § 2.1), the computa- 
tional penalty of applying the QATS algorithm as opposed to 
a traditional fixed-period search is negligible. 

4. EXAMPLE APPLICATION: KEPLER-36 

In this section, we present a complete application of the 
QATS algorithm on Kepler light curve data for the confirmed 
planetary system Kepler-36 (Carter et al. (2012), planet b also 
detected by Ofir & Dreizler (2012) as KOI-277.02). 

Kepler-36 is a planetary system with two confirmed transit- 
ing planets with orbital periods of approximately 13.84 days 
and 16.23 days. The proximity of these planets' orbits per- 
mits significant gravitational interactions at their closest ap- 
proach which modify their orbits sufficiently to yield signifi- 
cant (many hour) piecewise non-linearities in both their times 
of transit. As may be expected (see, e.g., Ford et al. (2012)), 
the timing non-linearities are anti-correlated between the two 
planets. 

The planets' radii differ by a factor of nearly 2.5 such that 
the transit of the smaller, interior planet b results in a loss of 
stellar flux that is only ss 17% that caused by the transit of the 
more massive planet c; while the transits of planet c are signif- 
icant by eye, those due to b are not. While both planets may be 
detected with a fixed-period search (Ofir & Dreizler (2012)), 
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FIG . 5 . — Stochastic background of the QATS search as a function of maximum search interval (normalized by transit duration) in the presence of 'white' noise. 
The broken lines show the background total transit signal-to-noise significance as determined from a Monte Carlo simulation using the QATS algorithm for a 
selection of transit durations q and A max - A m j n . The solid lines plot theory curves, given by Eqn. 31 with K given as in Eqn. 34, for the parameters described in 
§3.4.1. 



the QATS algorithm produces detections at much higher sig- 
nificance (by factors ~ 1 .7) and provides transit times that can 
be seen immediately to be anti-correlated without the need for 
any further analysis. 

4.1. Preparation of the Kepler data for Kepler-36 

The available Kepler data for Kepler-36 span approxi- 
mately 877 days of nearly uninterrupted observation at the 
29.4 minute long cadence interval (equivalent to 43,053 ca- 
dences). We utilize the 'raw' aperture photometry light curve 
(SAP_AP_FLUX in the Kepler ^zfs product, see Jenkins et al. 
(2010)). A simple moving average of 50 cadences in width 
(« 1 day) was divided through the data to remove astrophysi- 
cal and systematic trends. The detrended light curve was then 
median subtracted and missing cadences were filled with ze- 
roes. The resulting light curve is shown in the top panel of 
Fig. 6; the lower panel shows the light curve after removing 
the readily visible transits of planet c and replacing those tran- 
sit cadences with zeroes. We assume the data have point-to- 
point scatter a = 7.89 x 10~ 5 according to Carter et al. (2012). 

4.2. The detection of planet c 

We first apply the QATS algorithm to the light curve shown 
in the top of Fig. 6, containing the transits of both plan- 
ets b and c. We execute the QATS algorithm outlined in 
Algorithm 1 for each A m ; n ,■ in a grid where A m ; n , = i for 
< i < 43,052 and for three different quasiperiodic con- 
straints Amax,; _ A m i n ; = {0, 1,2}. In other words, we find the 
maximum total transit signal-to-noise, maxS(r]M,q)/cr, for all 
minimum intervals (in integer cadence) plausible for detection 
in the available data. We also fix the transit duration to q = 14, 
close to the duration, measured in cadences, as reported by 
Carter et al. (2012) for planet c. 

We plot in the top of Fig. 7 maxS^M , q) jo as a function of 
A m i n for each choice of quasiperiodic constraint. The profile 
of the QATS spectrum is dominated by the transits of planet 
c and is similar for all three choices (with profile explained 
analytically as described in § 3.4). However, the significance 
of the detection peak (around 794 cadences or « 16.2 days) 
and the level of the stochastic background (see § 3.4.1) in- 
crease with increasing A ma x,! _ A m in,;- In particular, we find 
that the detection significance of the transits of planet c in- 
creases from ss 80 to m 130 with the most dramatic improve- 
ment going from strictly periodic transits (A m ax,; - A m m.; = 0) 
to that with the freedom of single cadence between subse- 
quent transits. This improvement in significance coincides 
with better estimates of the instants of the transits (as shown 



with the progression of 'riverplot' figures in the bottom panel 
of Fig. 7) resulting in less dilution or smearing of the transits 
by the featureless light curve baseline. 

4.3. The detection of planet b 

After removing the transits of planet c (having located their 
starts in the previous QATS application with A maX) ; _ Amin,; = 
2), we perform a subsequent QATS search under the same 
conditions so as to identify the transits of planet b. We plot in 
the top of Fig. 8 the maximum signal-to-noise as a function of 
a (smaller range) of A m in for A ma x./ - A m m,; = {0, 1,2}. The 
peak in this spectrum corresponding to planet b (at A m i n = 
677) is the only dominant feature in this secondary spectrum 
outside of the sloped stochastic background. The background, 
in this case, is appreciable compared to the detection peak, 
again growing with increasing quasiperiodic interval. The de- 
tection peak remains clear (despite this background) growing 
from ss 10 for periodic transits to « 17 for the most liberal 
constraint of A ma x,; _ A mm ,/ = 2. The significance of individ- 
ual transits of planet b are close to 3, making their visual iden- 
tification difficult - even when folded at the best-fitting linear 
ephemeris (see the bottom panels of Fig. 8). However, with 
the guidance of the most-likely transit instants in the final fig- 
ure of the lower panel of Fig. 8, its presence becomes clear. 
This is made even more compelling given that the transit times 
are anti-correlated with those of planet c (compare the final 
figures in Figs. 7 and 8). 

A subsequent execution of the QATS algorithm, having 
now removed the transits of planet b, rendered no additional 
detections exceeding a threshold of 7.1. 

5. SUMMARY 

This paper described the Quasiperiodic Automated Transit 
Search (QATS) algorithm, based upon a specialization of the 
algorithm suggested by Kel'Manov & Jeon (2004). This al- 
gorithm is summarized in pseudocode in Algorithm 1 with 
certain definitions provided in § 3.1. 

This algorithm extends fixed-period box maximum- 
likelihood search methods (e.g., Kovacs et al. (2002)) by al- 
lowing for a variable interval between transits that is bounded 
by user-defined minimum and maximum values. For this rea- 
son, the QATS algorithm is well-suited to the unencumbered 
detection of transiting planets with substantial transit timing 
variations resulting from N-body interactions in a multi-object 
planetary system (Garcfa-Melendo & Lopez-Morales (201 1)). 
Such transiting planets have proven to be invaluable in charac- 
terizing the low mass end of the planetary mass-radius curve 
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FIG. 6. — Kepler data for Kepler-36. The top panel shows the available Kepler light curve data (plotted as a function of cadence number as opposed to time) for 
Kepler-36 after detrending and subtracting the median level (see § 4 for details). Missing cadences have been filled with zero values. The transits of planet c are 
evident; those of planet b are not. The bottom panel shows the same data after replacing the cadences when planet c is in transit with zeroes. 



(e.g., Lissauer et al. (201 1), Carter et al. (2012)). 

The additional computational load of the QATS algorithm 
relative to that of fixed-period searches is negligible for most 
searched conditions. However, in order to produce such an ef- 
ficient algorithm we require certain preconditions on the data 
including a uniform cadence without any missing observa- 
tions. Such conditions are closely approximated for the Ke- 
pler and CoRoT missions and may be reached in any circum- 
stance following some guidelines described in § 3.3. 

The additional freedom of quasiperiodicity comes at the 
cost of increased backgrounds. However, we show, in § 3.4. 1, 
that the growth of this background is sub-linear with increas- 
ing quasiperiodicity and would not likely exceed detection 
threshold limits in most scenarios. 

The algorithm presented here is generic for any quasiperi- 
odic box-like signal and its application may not be restricted 
to transiting exoplanets alone. Going further, altering the box 
filter to match a different pulse shape could extend the al- 
gorithms utility beyond box-shaped pulses; however, this is 
at the expense of the transit-oriented formalism presented in 
this paper. We direct the reader to the more generic work of 
Kel'Manov & Jeon (2004) to begin their specialization.' 

Implementations of the QATS algorithm may be found at 
either URL listed below: 



http : / /www . cf a . harvard .edu/~jacarter/ 
or 

http : / /www .astro. Washington. edu/users/agol/ 
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APPENDIX 

DERIVATION OF THE OBJECTIVE FUNCTION 

The likelihood that the model L n represents the data F„ - which is assumed to be contaminated by additive, Gaussian white 
noise with characteristic width a - is 

£(F„\L n ;r] M ,q,S)= (v^ttct 2 ) exp 
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FIG. 7. — The detection of Kepler-36c. Top. The top panel shows the QATS 'spectrum,' maximum detection significance as a function of minimum interval 
(in cadences) A m j„, for the data shown in the top panel of Fig. 6 for a selection of fixed A ma x — A nl i n . The highest significance peak corresponds to the period 
of planet c (again in cadences; each cadence is 29.4 minutes long, the period of planet c is fti 794 cadences or f» 16.2 days). The profile of the spectrum closely 
resembles that described analytically in § 3.4. The significance of the detection peak increases with increasing A max — A m j„ (as does the background). Bottom 
panels. These 'river plots,' normalized stellar flux as a function of transit number and time modulo the mean orbital period of planet c, correspond to the QATS 
spectra in the top panel. The cadences highlighted in red give the most likely instants of the transit at each transit number according the QATS algorithm. More 
freedom in the quasiperiodic constraint permits more accurate determination of the transit start times (and a subsequent boost to the detection significance). 



We consider the natural logarithm of this likelihood and substitute the definition of L„ from Eqn. 2, 

L(?7, q, 5) = log£(F„ \L„ ; 77, q, 5) 
= -ylog (Ina 1 )- 



(A2) 



1 £7 " 

^2 L F »-L"» 



n=0 



m=\ 



For fixed noise (fixed, finite a), maximization of the likelihood reduces to the minimization of the final braced expression, 
which is independent of a, in the above log-likelihood. Expanding this term, 



AM 



M 



AM 



(A3) 



where 



S z ( n ,q,5) = J2 



n=0 



m=l 



(A4) 



The first term in Eqn. A3 is constant with respect to the parameters, so the maximization of the likelihood is reduced to the 
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FIG. 8. — The detection of Kepler-36b. Refer to the caption of Fig. 7 for a description of the panels. Here, we have applied the QATS algorithm to the data 
shown in the lower panel of Fig. 6, where the transits of planet c have been effectively removed. The most-likely instants of the transits of planet b are clearly 
anticorrelated with those of planet c when A max — A m ; n = 2. 



maximization of S(r), q,6). We may write the second term in S 2 (rj, q, 5) as 



M 


\ 2 MM 


m=1 


J — ^ \ ^ ^ Mn-n m M-n—n, 
/ m=l m'=l 




M M 




= y ' y y U n-n,„ dm.m 
m=l m'=l 




M 




m= 1 



(A5) 



where in the second line we have used the fact that the transits are non-overlapping for m^m'. We may then switch the order of 
summation in S 2 (r), q, S) and let n - n m — > j such that 



S 2 (7 h q,S) = Y, 



2 F i + ">» u j- Yj u ) 



J=-'h„ 



J=-I1„, 



(A6) 



The lower and upper bounds in the sum over /' may be replaced with and q — 1, respectively, as uj is non-zero only within those 
limits (Eqn. 3) and Eqns. 4-6 imply that n m > and N— l-n m > N- 1 -«m > q~ 1; 



M 



S 2 ( V ,q,S) = Y, 



(A7) 
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Applying the definition of uj (Eqn. 3) we find that 

S 2 (r,,q,5) = 25S(r 1 ,q)-Mq5 2 (A8) 



where we have isolated the term that depends on the data, 



m q-\ 

m=l 7=0 



(A9) 



The objective function S(rj,q, 5) is trivially maximized with respect to the depth of transit. The depth at maximum likelihood, 
(5be S t, obeys the algebraic equation 



dS 2 (rj,q,5) 



85 



= 2S(r),q)-2Mq6 best = 



so that 



<5best - 



Mq 



(A10) 



(All) 



As a result, one need not consider different choices of transit depth in the numerical maximization of S(t],q,S) with QATS. We 
only need to consider the marginalized expression over depth, 

S 2 (v,q) = S 2 (ri,q 7 5 he>it ) = 2S best S(r],q)-MqSl est 



-2^-Mq 
Mq H 



S(r),q) 
Mq 



Mq 

and taking a square root, we have our final expression for the object function 

S(r],q) 



S( V ,q)-- 



fMq 



(A12) 



(A13) 
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